Background:

In clinical practice, interpreting novel germline variants in bone marrow failure (BMF) and hematologic malignancy (HM) predisposition genes requires extensive literature review, often taking hours of searching through scattered publications and databases that contain incomplete information. This fragmentation hampers variant classification, genetic counseling, and therapeutic decision-making. The St. Jude BMF Team developed an interactive platform (stjude.org/BMFgenes) that consolidates expert-curated genotypes and phenotypes, with the objective of enabling rapid, clinically contextualized variant interrogation across selected BMF/HM disorders.

Methods:

We systematically curated germline variants from 464 publications and institutional datasets, capturing genomic coordinates, cDNA/protein annotations, gnomAD frequencies, demographics, age at diagnosis, hematologic features, and constitutional symptoms. The new platform (stjude.org/BMFgenes) is built on the open-source ProteinPaint codebase (Zhou, PMID: 26711108; Matt, PMID: 38593228). It provides interactive data visualization with direct access to the original sources and is designed to support future community data submission, curation, and cataloging by BMF researchers worldwide.
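As a rough illustration of the per-case fields listed above, a curated record might be modeled as shown here. This is a minimal sketch under stated assumptions: the class name `CuratedCase`, every field name, and the convention of storing `None` for variants absent from gnomAD are hypothetical and do not describe the platform's actual data model or the ProteinPaint file format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CuratedCase:
    """One curated germline case (hypothetical schema for illustration)."""
    gene: str
    chrom: str
    position: int               # genomic coordinate; the assembly (e.g. GRCh38) is an assumption
    ref: str
    alt: str
    cdna: str                   # HGVS c. notation
    protein: str                # HGVS p. notation
    gnomad_af: Optional[float]  # hypothetical convention: None when absent from gnomAD
    sex: Optional[str] = None
    age_at_diagnosis_years: Optional[float] = None
    hematologic_features: set[str] = field(default_factory=set)
    constitutional_symptoms: set[str] = field(default_factory=set)
    source: str = ""            # e.g. a PMID or an institutional dataset identifier

    def absent_from_gnomad(self) -> bool:
        """True if the variant has no recorded or a zero gnomAD allele frequency."""
        return self.gnomad_af is None or self.gnomad_af == 0.0
```

Using `field(default_factory=set)` gives each record its own phenotype sets instead of sharing one mutable default across instances.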

Results:

The initial release (v1.0) encompasses 1,931 cases across 7 core genes (GATA2, GATA1, SAMD9, SAMD9L, CEBPA, DKC1, SH2B3). Key results for each gene and associated disease are: GATA2 deficiency (1,070 cases, 329 variants: 65% null, 31% missense, 2% intronic regulatory); GATA1-related cytopenia (100 cases, 19 variants, median age 3 years, acquired trisomy 21 in 4 cases); SAMD9 and SAMD9L syndromes (247 cases total; monosomy 7 in 44% and 36%, respectively); CEBPA-associated familial AML (103 cases, 32 variants, median age 26 years, AML in 95%); DKC1-related telomere disorders (375 cases, 101 variants, complete mucocutaneous triad in 76 cases); and SH2B3-associated hereditary myeloproliferative disorder (34 cases, 26 variants, median age 4 months). Additional BMF/HM predisposition genes will be added in future releases.

To demonstrate the platform's analytical capabilities and their implications for clinical practice, we performed an in-depth interrogation of the GATA2 and SH2B3 cohorts. For GATA2 (stjude.org/GATA2), most patient variants (96%, 315/329) were absent from gnomAD controls, consistent with the high disease penetrance observed (93%, 994/1070). Among symptomatic patients, 70% developed MDS/AML at a young age (median 17 years); remarkably, however, MDS/AML was rare before age 6 years and virtually absent before age 3 years. This finding could inform surveillance strategies by sparing young children unnecessary procedures. Genotype-phenotype analysis revealed distinctions by variant type: null variants were associated with near-complete penetrance (97%) and earlier onset of MDS/AML than other variants (16 vs 21 years, p<0.0001). Conversely, carriers of intron 4 variants had reduced penetrance (72%), suggesting a spectrum of clinical impact. Hazard ratio analysis showed increased MDS/AML risk for GATA2 null variants (p<0.005) and reduced risk for intron 4 variants (p<0.05). We next used the dataset to interrogate trajectories of clonal evolution. Monosomy 7/der(1;7) was the predominant karyotypic abnormality (37%), followed by trisomy 8 (16%). Co-occurring somatic mutations in SETBP1, ASXL1, and STAG2 showed variable association with monosomy 7, which was present in 85%, 40%, and 26% of cases carrying these mutations, respectively.
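The headline GATA2 proportions above are binomial fractions, so one way to convey their precision is a confidence interval. The sketch below applies the Wilson score interval to the two counts quoted in this paragraph (315/329 and 994/1070). The choice of the Wilson method and the helper name `wilson_interval` are assumptions for illustration, not the statistical approach reported by the authors; the sketch also treats each count as a simple proportion and does not correct for ascertainment of affected families.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (point estimate, lower, upper) for a binomial proportion
    using the Wilson score interval; z=1.96 gives roughly 95% coverage."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, center - half, center + half


# Counts quoted in the abstract for the GATA2 cohort.
for label, k, n in [
    ("variants absent from gnomAD", 315, 329),
    ("overall penetrance", 994, 1070),
]:
    p, lo, hi = wilson_interval(k, n)
    print(f"{label}: {p:.1%} (95% CI {lo:.1%}-{hi:.1%})")
```

Unlike the simple normal-approximation interval, the Wilson interval stays within [0, 1] even for proportions close to 100%, which matters for estimates like these.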

For SH2B3 (stjude.org/SH2B3), aggregating data across 11 studies revealed a broad phenotypic spectrum for this novel myeloproliferative disorder (MPD) predisposition. Most reported cases presented during infancy (59%, 20/34). Clinical presentations included JMML/JMML-like disorder (35%, 12/34) and neonatal MPD (26%, 9/34). Adult-onset MPD occurred in 8 patients (24%), and autoimmune disorders affecting multiple organs were observed in 10 patients (29%). These findings establish SH2B3-associated disease as a myeloproliferative/autoimmune disorder spanning the age spectrum.
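The presentation counts in this paragraph add up to 39 (12 + 9 + 8 + 10), more than the 34 patients in the cohort, which implies that some patients fall into more than one category. A multi-label tally handles this directly. The sketch below uses a hypothetical five-case toy cohort, not the SH2B3 data, and the helper name `label_frequencies` is illustrative only.

```python
from collections import Counter

# Hypothetical toy cohort: each case is a set of presentation labels.
cases = [
    {"JMML/JMML-like"},
    {"JMML/JMML-like", "autoimmune"},
    {"neonatal MPD"},
    {"adult-onset MPD", "autoimmune"},
    {"autoimmune"},
]


def label_frequencies(cohort: list[set[str]]) -> dict[str, tuple[int, float]]:
    """Map each label to (number of cases carrying it, fraction of the cohort).

    Labels are not mutually exclusive, so the fractions can sum to more than 1.
    """
    counts = Counter(label for case in cohort for label in case)
    n = len(cohort)
    return {label: (k, k / n) for label, k in counts.most_common()}


for label, (k, frac) in label_frequencies(cases).items():
    print(f"{label}: {k}/{len(cases)} ({frac:.0%})")
```

Because each case contributes once per label it carries, per-label percentages describe how common each presentation is without forcing cases into a single exclusive category.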

Conclusions:

The platform facilitates rapid evaluation of both published and unpublished cases and enables exploration of variant distributions across protein domains, phenotype-based filtering, and identification of genotype-phenotype correlations. It also supports direct submission of new cases.
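Exploring variant distributions across protein domains combined with phenotype-based filtering amounts to binning variants by residue position and then counting within a filtered subset. The minimal sketch below shows that idea. The domain names and residue ranges in `DOMAINS`, the `Case` type, and the toy cases are hypothetical placeholders, not the platform's implementation or curated domain annotations.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical domain map: name -> inclusive (start, end) residue range.
# Placeholder values for illustration, not annotations of any real protein.
DOMAINS = {"domain_A": (1, 100), "domain_B": (150, 220)}


@dataclass(frozen=True)
class Case:
    residue: int                # protein position affected by the variant
    phenotypes: frozenset[str]  # phenotype labels recorded for the case


def domain_of(residue: int) -> str:
    """Return the first domain whose range contains the residue."""
    for name, (start, end) in DOMAINS.items():
        if start <= residue <= end:
            return name
    return "outside annotated domains"


def variants_per_domain(cases: list[Case], phenotype: Optional[str] = None) -> Counter:
    """Count cases per domain, optionally keeping only cases with one phenotype."""
    selected = [c for c in cases if phenotype is None or phenotype in c.phenotypes]
    return Counter(domain_of(c.residue) for c in selected)


# Toy cases (hypothetical).
cases = [
    Case(50, frozenset({"MDS/AML"})),
    Case(160, frozenset({"MDS/AML", "immunodeficiency"})),
    Case(300, frozenset({"immunodeficiency"})),
]
print(variants_per_domain(cases))
print(variants_per_domain(cases, phenotype="MDS/AML"))
```

Passing `phenotype=None` counts every case; supplying a label restricts the count to cases carrying that phenotype.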
